Encoding of video cross-fades using weighted prediction

ABSTRACT

A video encoder and method are provided for encoding video signal data for at least one cross-fade picture disposed between a fade-out start picture and a fade-in end picture, where the encoder portion includes a reference picture weighting factor unit for assigning weighting factors corresponding to each of the fade-out start picture and the fade-in end picture, respectively, and the method for encoding cross-fades between pictures includes identifying pictures between which a cross-fade is desired, determining appropriate end-points for the cross-fade, and encoding the end-points prior to encoding the cross-fade picture.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application Ser. No. 60/430,793 (Attorney Docket No. PU020487), filed Dec. 4, 2002 and entitled “ENCODING OF VIDEO CROSS-FADES USING WEIGHTED PREDICTION”, which is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention is directed towards video encoders, and in particular, towards an apparatus and method for effectively producing video cross-fades between pictures.

BACKGROUND OF THE INVENTION

Video data is generally processed and transferred in the form of bitstreams. Typical video compression coders and decoders (“CODECs”) gain much of their compression efficiency by forming a reference picture prediction of a picture to be encoded, and encoding the difference between the current picture and the prediction. The more closely the prediction is correlated with the current picture, the fewer bits are needed to compress that picture, thereby increasing the efficiency of the process. Thus, it is desirable for the best possible reference picture prediction to be formed.

In many video compression standards, including Moving Picture Experts Group (“MPEG”)-1, MPEG-2 and MPEG-4, a motion compensated version of a previous reference picture is used as a prediction for the current picture, and only the difference between the current picture and the prediction is coded. When a single picture prediction (“P” picture) is used, the reference picture is not scaled when the motion compensated prediction is formed. When bi-directional picture predictions (“B” pictures) are used, intermediate predictions are formed from two different pictures, and then the two intermediate predictions are averaged together, using equal weighting factors of (½, ½) for each, to form a single averaged prediction.

In some video sequences, in particular those with fades, the current picture to be coded is more strongly correlated to the reference picture scaled by a weighting factor than to the reference picture itself. The Joint Video Team (“JVT”) video compression standard allows weighting factors and offsets to be sent for each reference picture. The standard specifies how the decoder will use the weighting factors, but it does not specify how an encoder might determine an appropriate weighting factor. For sequences that include cross-fades, determining the appropriate weighting factors and reference pictures to use is quite difficult.

SUMMARY OF THE INVENTION

These and other drawbacks and disadvantages of the prior art are addressed by an apparatus and method that efficiently compress video cross-fades using JVT weighted prediction. The end-points of a cross-fade are determined and used as reference pictures for encoding pictures in the cross-fade region.

An apparatus and method are provided for encoding video signal data for a cross-fade picture disposed between a fade-out or start picture and a fade-in or end picture, where the encoder portion includes a reference picture weighting factor unit for assigning weighting factors corresponding to each of the fade-out start picture and the fade-in end picture, respectively, and the method for encoding cross-fades between pictures includes identifying pictures between which a cross-fade is desired, determining appropriate end-points for the cross-fade, and encoding the end-points prior to encoding the cross-fade picture.

These and other aspects, features and advantages of the present disclosure will become apparent from the following description of exemplary embodiments, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood with reference to the following exemplary figures, in which:

FIG. 1 shows a block diagram for a standard video encoder;

FIG. 2 shows a block diagram for a video encoder with implicit reference picture weighting for video cross-fades;

FIG. 3 shows a block diagram for a video encoder with explicit reference picture weighting for video cross-fades;

FIG. 4 shows a block diagram for a video decoder with explicit reference picture weighting for video cross-fades;

FIG. 5 shows a pictorial representation of a video cross-fade between a pair of pictures; and

FIG. 6 shows a flowchart for an exemplary encoding process.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

An apparatus and method are disclosed for encoding of video cross-fades using weighted prediction, including motion vector estimation and adaptive reference picture weighting factor assignment. In some video sequences, in particular those with fading, the current picture or image block to be coded is more strongly correlated to a reference picture scaled by a weighting factor than to the reference picture itself. Video encoders without weighting factors applied to reference pictures encode fading sequences very inefficiently. When weighting factors are used in encoding, a video encoder needs to determine both weighting factors and motion vectors, but the best choice for each of these depends on the other.

Hence, a method is described to efficiently compress video cross-fades using JVT weighted prediction. The end-points of a cross-fade are first determined and used as the reference pictures for encoding the pictures in the cross-fade region.

The present description illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. Applicant thus regards any means that can provide those functionalities as equivalent to those shown herein.

In some video sequences, in particular those with fading, the current picture or image block to be coded is more strongly correlated to a reference picture scaled by a weighting factor than to the reference picture itself. Video encoders without weighting factors applied to reference pictures encode fading sequences very inefficiently.

In the Joint Video Team (“JVT”) video compression standard, each P picture can use multiple reference pictures to form a picture's prediction, but each individual macroblock or macroblock partition (of size 16×8, 8×16, or 8×8) uses only a single reference picture for prediction. In addition to coding and transmitting the motion vectors, a reference picture index is transmitted for each macroblock or macroblock partition, indicating which reference picture is used. A limited set of possible reference pictures is stored at both the encoder and decoder, and the number of allowable reference pictures is transmitted. Unlike in previous standards, such as MPEG-2, a JVT encoder has considerable flexibility in that previously coded pictures can be used as reference pictures.

In the JVT standard for bi-predictive pictures (also called “B” pictures), two predictors are formed for each macroblock or macroblock partition, each of which can be from a separate reference picture, and the two predictors are averaged together to form a single averaged predictor. For bi-predictively coded motion blocks, the reference pictures can both be from the forward direction, both be from the backward direction, or one each from the forward and backward directions.

Two lists are maintained of the available reference pictures that may be used for prediction. The two reference pictures are referred to as the List 0 and List 1 predictors. An index for each reference picture is coded and transmitted, ref_idx_l0 and ref_idx_l1, for the List 0 and List 1 reference pictures, respectively.

The JVT standard provides two modes of weighted prediction, which allow weighting factors and/or offsets to be applied to reference pictures when forming a prediction. The weighting factor to be used is based on the reference picture index (or indices in the case of bi-prediction) for the current macroblock or macroblock partition. The reference picture indices are either coded in the bitstream or may be derived, such as for skipped or direct mode macroblocks. A single weighting factor and single offset are associated with each reference picture index for all of the slices of the current picture. In explicit mode, these parameters are coded in the slice header. In implicit mode, these parameters are derived. The weighting factors and offset parameter values are constrained to allow for 16-bit arithmetic operations in the inter-prediction process. The encoder may select either implicit mode or explicit mode for each coded picture.

JVT bi-predictive or “B” pictures allow adaptive weighting between the two predictions, i.e., Pred = [(P0)*(Pred0)] + [(P1)*(Pred1)] + D, where P0 and P1 are weighting factors, Pred0 and Pred1 are the reference picture predictions for List 0 and List 1 respectively, and D is an offset.

As shown in FIG. 1, a standard video encoder is indicated generally by the reference numeral 100. An input to the encoder 100 is connected in signal communication with a non-inverting input of a summing junction 110. The output of the summing junction 110 is connected in signal communication with a block transform function 120. The transform 120 is connected in signal communication with a quantizer 130. The output of the quantizer 130 is connected in signal communication with a variable length coder (“VLC”) 140, where the output of the VLC 140 is an externally available output of the encoder 100.

The output of the quantizer 130 is further connected in signal communication with an inverse quantizer 150. The inverse quantizer 150 is connected in signal communication with an inverse block transformer 160, which, in turn, is connected in signal communication with a reference picture store 170. A first output of the reference picture store 170 is connected in signal communication with a first input of a motion estimator 180. The input to the encoder 100 is further connected in signal communication with a second input of the motion estimator 180. The output of the motion estimator 180 is connected in signal communication with a first input of a motion compensator 190. A second output of the reference picture store 170 is connected in signal communication with a second input of the motion compensator 190. The output of the motion compensator 190 is connected in signal communication with an inverting input of the summing junction 110.

Turning to FIG. 2, a video encoder with implicit reference picture weighting is indicated generally by the reference numeral 200. An input to the encoder 200 is connected in signal communication with a non-inverting input of a summing junction 210. The output of the summing junction 210 is connected in signal communication with a block transformer 220. The transformer 220 is connected in signal communication with a quantizer 230. The output of the quantizer 230 is connected in signal communication with a VLC 240, where the output of the VLC 240 is an externally available output of the encoder 200.

The output of the quantizer 230 is further connected in signal communication with an inverse quantizer 250. The inverse quantizer 250 is connected in signal communication with an inverse block transformer 260, which, in turn, is connected in signal communication with a reference picture store 270. A first output of the reference picture store 270 is connected in signal communication with a first input of a reference picture weighting factor assignor 272. The input to the encoder 200 is further connected in signal communication with a second input of the reference picture weighting factor assignor 272. A second output of the reference picture store 270 is connected in signal communication with a second input of the motion estimator 280.

The input to the encoder 200 is further connected in signal communication with a third input of the motion estimator 280. The output of the motion estimator 280, which is indicative of motion vectors, is connected in signal communication with a first input of a motion compensator 290. A third output of the reference picture store 270 is connected in signal communication with a second input of the motion compensator 290. The output of the motion compensator 290, which is indicative of a motion compensated reference picture, is connected in signal communication with a first input of a multiplier or reference picture weighting applicator 292. Although an exemplary multiplier embodiment is shown, the reference picture weighting applicator 292 may be implemented in alternate ways, such as, for example, by a shift register. The output of the reference picture weighting factor assignor 272, which is indicative of a weighting factor, is connected in signal communication with a second input of the reference picture weighting applicator 292. The output of the reference picture weighting applicator 292 is connected in signal communication with an inverting input of the summing junction 210.

Turning to FIG. 3, a video encoder with explicit reference picture weighting is indicated generally by the reference numeral 300. An input to the encoder 300 is connected in signal communication with a non-inverting input of a summing junction 310. The output of the summing junction 310 is connected in signal communication with a block transformer 320. The transformer 320 is connected in signal communication with a quantizer 330. The output of the quantizer 330 is connected in signal communication with a VLC 340, where the output of the VLC 340 is an externally available output of the encoder 300.

The output of the quantizer 330 is further connected in signal communication with an inverse quantizer 350. The inverse quantizer 350 is connected in signal communication with an inverse block transformer 360, which, in turn, is connected in signal communication with a reference picture store 370. A first output of the reference picture store 370 is connected in signal communication with a first input of a reference picture weighting factor assignor 372. The input to the encoder 300 is further connected in signal communication with a second input of the reference picture weighting factor assignor 372. A first output of the reference picture weighting factor assignor 372, which is indicative of a weighting factor, is connected in signal communication with a first input of a motion estimator 380. A second output of the reference picture store 370 is connected in signal communication with a second input of the motion estimator 380.

The input to the encoder 300 is further connected in signal communication with a third input of the motion estimator 380. The output of the motion estimator 380, which is indicative of motion vectors, is connected in signal communication with a first input of a motion compensator 390. A third output of the reference picture store 370 is connected in signal communication with a second input of the motion compensator 390. The output of the motion compensator 390, which is indicative of a motion compensated reference picture, is connected in signal communication with a first input of a multiplier or reference picture weighting applicator 392. A second output of the reference picture weighting factor assignor 372, which is indicative of a weighting factor, is connected in signal communication with a second input of the reference picture weighting applicator 392. The output of the reference picture weighting applicator 392 is connected in signal communication with a first non-inverting input of a summing junction 394. A third output of the reference picture weighting factor assignor 372, which is indicative of an offset, is connected in signal communication with a second non-inverting input of the summing junction 394. The output of the summing junction 394 is connected in signal communication with an inverting input of the summing junction 310.

As shown in FIG. 4, a video decoder for explicit reference picture weighting is indicated generally by the reference numeral 500. The video decoder 500 includes a variable length decoder (“VLD”) 510 connected in signal communication with an inverse quantizer 520. The inverse quantizer 520 is connected in signal communication with an inverse transformer 530. The inverse transformer 530 is connected in signal communication with a first input terminal of a summing junction 540, where the output of the summing junction 540 provides the output of the video decoder 500. The output of the summing junction 540 is connected in signal communication with a reference picture store 550. The reference picture store 550 is connected in signal communication with a motion compensator 560, which is connected in signal communication with a first input of a multiplier or reference picture weighting applicator 570. As will be recognized by those of ordinary skill in the pertinent art, the decoder 500 for explicit weighted prediction may also be used for implicit weighted prediction.

The VLD 510 is further connected in signal communication with a reference picture weighting factor lookup 580 for providing a coefficient index to the lookup 580. A first output of the lookup 580 is for providing a weighting factor, and is connected in signal communication to a second input of the reference picture weighting applicator 570. The output of the reference picture weighting applicator 570 is connected in signal communication to a first input of a summing junction 590. A second output of the lookup 580 is for providing an offset, and is connected in signal communication to a second input of the summing junction 590. The output of the summing junction 590 is connected in signal communication with a second input terminal of the summing junction 540.

As shown in FIG. 5, a picture cross-fade is indicated generally by the reference numeral 600. The exemplary picture cross-fade 600 includes a fade-out or starting picture 610, identified as FP0, and a fade-in or ending picture 612, identified as FP1.

Turning now to FIG. 6, an exemplary process for encoding video signal data for an image block is indicated generally by the reference numeral 700. The process 700 is implemented with an encoder, such as the encoder 200 or 300 of FIGS. 2 and 3, respectively. The process 700 includes a start block 710 that passes control to a decision block 712. The decision block 712 determines whether a cross-fade is present, and, if none is present, passes control to a function block 713. The function block 713 performs normal encoding and passes control to an end block 724.

However, if the decision block 712 finds a cross-fade, it passes control to a function block 714. The function block 714 finds the fade-out starting point, FP0, and passes control to a function block 716, which finds the fade-in ending point FP1. The block 716 passes control to a function block 718, which codes the fade-out start picture FP0 and passes control to a function block 720. The block 720 codes the fade-in end picture FP1 and passes control to a function block 722.

The function block 722, in turn, codes pictures disposed in display order between FP0 and FP1, using weighted prediction with the picture FP0 as the List 0 reference and the picture FP1 as the List 1 reference. The function block 722 passes control to the end block 724.

An authoring tool used for video cross-fades between a pair of pictures includes a video encoder, such as the encoder 200 of FIG. 2, and operates on pre-stored video content. In addition to the uncompressed video content, some additional information may be available such as decision lists and editing splice points. The video encoder in an authoring tool does not necessarily need to operate in real time. Special effects such as fades and cross-fades can be applied in the authoring tool.

Various techniques are well known for detecting fades and cross-fades, also known as dissolves, in video sequences. When encoding a particular picture, for each macroblock or macroblock partition, a JVT encoder must select a coding decision mode, one or two reference pictures, and one or more motion vectors. When a JVT encoder uses weighted prediction, once per picture or slice it may also select a weighting factor to be applied for each reference index used. One or more reference indices refer to each allowable reference picture, so multiple weights can be used for each individual reference picture.

The authoring tool detects when a cross-fade is taking place. It has sufficient information to do so either because it applied the cross-fade itself, because it read it from a decision list, or because it employs a fade detection algorithm. For a cross-fade, the picture at the fade-out starting point is identified as FP0 and the picture at the fade-in ending point is identified as FP1. When a cross-fade is detected, the encoder codes pictures FP0 and FP1 prior to coding the pictures between FP0 and FP1 in display order, which are referred to as the cross-fade pictures. Thus, a feature of the present invention is that the fade-in end picture, FP1, is coded before the intermediate pictures.

It is common in video encoders to use a fixed pattern of the I, P and B picture coding types, and for the coding order to differ from the display order. For example, such a common pattern might comprise:

Common Coding Order: I0 P3 B1 B2 P6 B4 B5 P9 B7 B8
Common Display Order: I0 B1 B2 P3 B4 B5 P6 B7 B8 P9

For this common pattern, picture P3 is coded before the intermediate B1 and B2 pictures. The B1 and B2 pictures use I0 and P3 as reference pictures in their prediction processes.

The JVT standard does not require the use of fixed picture coding type patterns, and does not suggest methods by which an encoder can adjust the patterns to maximize coding efficiency. In accordance with the current invention, coding efficiency of cross-fading sequences can be improved by adjusting picture coding type and coding order. If, for example, picture 0 and picture 9 were identified as the fade-out start and fade-in end pictures, respectively, the following coding and display order could be used:

Inventive Coding Order: I0 P9 B1 B2 B3 B4 B5 B6 B7 B8
Inventive Display Order: I0 B1 B2 B3 B4 B5 B6 B7 B8 P9
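The reordering in this example can be sketched in a few lines of Python. This is only an illustration: the function name crossfade_coding_order is hypothetical, and coding FP0 as an I picture and FP1 as a P picture is an assumption taken from the example above rather than a requirement of the method.

```python
def crossfade_coding_order(fp0, fp1):
    """Return (picture_type, display_index) pairs in coding order.

    fp0 and fp1 are the display-order indices of the fade-out start
    picture and the fade-in end picture. Assumption: FP0 is coded as
    an I picture and FP1 as a P picture, as in the example above.
    """
    if fp1 <= fp0:
        raise ValueError("FP1 must follow FP0 in display order")
    # Both end-points are coded first so that they are available as
    # references for every cross-fade picture between them.
    order = [("I", fp0), ("P", fp1)]
    # The cross-fade pictures follow, in display order, as B pictures.
    order += [("B", n) for n in range(fp0 + 1, fp1)]
    return order


labels = [f"{ptype}{index}" for ptype, index in crossfade_coding_order(0, 9)]
print(" ".join(labels))  # I0 P9 B1 B2 B3 B4 B5 B6 B7 B8
```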

When a cross-fade picture is encoded, the encoder orders the reference picture lists, using reference picture selection reordering if necessary, such that FP0 is the first picture on List 0 and FP1 is the first picture on List 1. This provides additional coding efficiency, because the reference index of 0, which refers to the first picture in the reference picture list, can be coded using fewer bits than other reference indices. Then a weighting factor is selected for the reference indices corresponding to each of FP0 and FP1, based on the relative contribution of the first picture and the second picture in the composition of the current picture. If the formula used in creating the cross-fade picture is known, either because the authoring tool created the cross-fade, or from side-information, then the weighting factor from the composition formula can be used. If the exact formula is not known, a weighting factor can be computed using any of several different algorithms, such as those based on relative distance of the current picture from FP0 and FP1, for example.
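One distance-based choice can be sketched as follows. The linear blend and the function name distance_weights are assumptions made for illustration; the text only states that the weights may be based on relative distance. The returned pair sums to 1, matching the un-halved form Pred = P0*Pred0 + P1*Pred1 + D given earlier.

```python
def distance_weights(n, fp0, fp1):
    """Return (w0, w1) for the cross-fade picture at display index n.

    Assumption: a linear cross-fade, so the closer end-point gets the
    larger weight. w0 applies to FP0 (List 0), w1 to FP1 (List 1).
    """
    if not fp0 < n < fp1:
        raise ValueError("picture must lie strictly between FP0 and FP1")
    w1 = (n - fp0) / (fp1 - fp0)
    return 1.0 - w1, w1


# Picture 3 of a fade from picture 0 to picture 9 leans towards FP0.
w0, w1 = distance_weights(3, 0, 9)
print(round(w0, 3), round(w1, 3))
```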

The above-described algorithm can be applied for all coded pictures in the cross-fade region, or may be applied only for those pictures that are marked to be stored as reference pictures. In alternate embodiments, either implicit mode or explicit mode weighted prediction may be used to code the cross-fade pictures. When explicit mode is used, any weighting factors may be used. When implicit mode is used, the weighting factors depend on the relative distance of the current picture from FP0 and FP1.

This system and technique may be applied to either Predictive “P” pictures, which are encoded with a single predictor, or to Bi-predictive “B” pictures, which are encoded with two predictors. The decoding processes, which are present in both encoders and decoders, are described below for the P and B picture cases. Alternatively, this technique may also be applied to coding systems using concepts similar to I, B, and P pictures.

The same weighting factors can be used for single directional prediction in B pictures and for bi-directional prediction in B pictures. When a single predictor is used for a macroblock, in P pictures or for single directional prediction in B pictures, a single reference picture index is transmitted for the block. After the decoding process step of motion compensation produces a predictor, the weighting factor is applied to the predictor. The weighted predictor is then added to the coded residual, and clipping is performed on the sum, to form the decoded picture. For use for blocks in P pictures or for blocks in B pictures that use only List 0 prediction, the weighted predictor is formed as:

Pred = W0*Pred0 + D0  (1)

where W0 is the weighting factor associated with the List 0 reference picture, D0 is the offset associated with the List 0 reference picture, and Pred0 is the motion-compensated prediction block from the List 0 reference picture.

For use for blocks in B pictures that use only List 1 prediction, the weighted predictor is formed as:

Pred = W1*Pred1 + D1  (2)

where W1 is the weighting factor associated with the List 1 reference picture, D1 is the offset associated with the List 1 reference picture, and Pred1 is the motion-compensated prediction block from the List 1 reference picture.

The weighted predictors may be clipped to guarantee that the resulting values will be within the allowable range of pixel values, typically 0 to 255. The precision of the multiplication in the weighting formulas may be limited to any pre-determined number of bits of resolution.

In the bi-predictive case, reference picture indexes are transmitted for each of the two predictors. Motion compensation is performed to form the two predictors. Each predictor uses the weighting factor associated with its reference picture index to form two weighted predictors. The two weighted predictors are then averaged together to form an averaged predictor, which is then added to the coded residual.

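For use for blocks in B pictures that use List 0 and List 1 predictions, the weighted predictor is formed as:

Pred = (P0*Pred0 + D0 + P1*Pred1 + D1)/2  (3)

where P0 and P1 are the weighting factors associated with the List 0 and List 1 reference pictures (written W0 and W1 in equations (1) and (2)), and D0, D1, Pred0 and Pred1 are as defined for equations (1) and (2).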

Clipping may be applied to the weighted predictor or any of the intermediate values in the calculation of the weighted predictor to guarantee that the resulting values will be within the allowable range of pixel values, typically 0 to 255.
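Equations (1) through (3), together with the clipping step, can be illustrated with the following sketch. It is not the normative decoding process: floating-point weights and Python's built-in rounding are simplifying assumptions (a real codec would use the limited-precision integer arithmetic mentioned above), and the function names are hypothetical.

```python
def clip(value, lo=0, hi=255):
    """Clamp a sample to the allowable pixel range."""
    return max(lo, min(hi, value))


def weighted_pred_single(pred, w, d):
    """Equations (1) and (2): Pred = W * PredX + D, then clip."""
    return [clip(round(w * p + d)) for p in pred]


def weighted_pred_bi(pred0, pred1, w0, d0, w1, d1):
    """Equation (3): average of the two weighted predictors, then clip."""
    return [
        clip(round((w0 * a + d0 + w1 * b + d1) / 2))
        for a, b in zip(pred0, pred1)
    ]


# A one-row "block" of two samples from each reference picture.
print(weighted_pred_single([100, 200], 0.5, 10))           # [60, 110]
print(weighted_pred_single([200], 2.0, 0))                 # [255] after clipping
print(weighted_pred_bi([100], [200], 1.0, 0, 1.0, 0))      # [150]
```

With both weights equal to 1 and both offsets equal to 0, equation (3) reduces to the plain average used for conventional B pictures.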

Thus, a weighting factor is applied to the reference picture prediction of a video compression encoder and decoder that uses multiple reference pictures. The weighting factor adapts for individual motion blocks within a picture, based on the reference picture index that is used for that motion block. Because the reference picture index is already transmitted in the compressed video bitstream, the additional overhead to adapt the weighting factor on a motion block basis is dramatically reduced. All motion blocks that are coded with respect to the same reference picture apply the same weighting factor to the reference picture prediction.

In the Joint Model (“JM”) software of the JVT committee, an a posteriori method using rate distortion optimization is used for selection of motion vectors, macroblock partitioning, prediction mode, and reference picture indices. In this method, a range of allowable values for each of these choices is tested and a cost is determined for each choice. The choice that leads to the minimum cost is selected.

Motion estimation techniques have been widely studied. For each motion block of a picture being coded, a motion vector is chosen that represents a displacement of the motion block from a reference picture. In an exhaustive search method within a search region, every displacement within a pre-determined range of offsets relative to the motion block position is tested. The test includes calculating the sum of absolute differences (“SAD”) or mean squared error (“MSE”) between the pixels of the motion block in the current picture and those of the displaced motion block in a reference picture. The offset with the lowest SAD or MSE is selected as the motion vector. Numerous variations on this technique have been proposed, such as three-step search and rate-distortion optimized motion estimation, all of which include the step of computing the SAD or MSE of the current motion block with a displaced motion block in a reference picture.
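A minimal sketch of the exhaustive search with SAD is given below. Pictures are assumed to be lists of rows of integer samples, candidate displacements that fall outside the reference picture are simply skipped, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in the
    current picture and the block displaced by (dx, dy) in the reference."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Test every displacement within +/- search_range and return
    (best_sad, dx, dy), or None if no candidate fits in the picture."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Assumption: candidates leaving the picture are not tested.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Calling full_search(cur, ref, bx, by, 4, 1), for instance, tests the nine displacements around a 4×4 block; the (dx, dy) part of the result is the motion vector.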

Computational costs for determining motion vectors and adaptive reference picture weighting factors can be reduced by using an iterative process, while still selecting motion vectors and weighting factors that are able to achieve high compression efficiencies. An exemplary embodiment motion vector and weighting factor determination process is described assuming that a single weighting factor is applied to the entire reference picture, although the principles of the invention should not be construed as being so limited. The process could also be applied over smaller regions of the picture, such as slices, for example. In addition, although one exemplary embodiment is described as using only a single reference picture, the principles may also be applied to multiple reference picture prediction and to bi-predictive pictures.

Calculation of the motion vector for a motion block can typically best be done when the weighting factor to be used is known. In an exemplary embodiment, an estimate of the weighting factor is formed, using the reference picture and the current picture pixel values. The weighting factor may be limited to a number of bits of resolution. If the weighting factor is very close to 1, there is no need to consider the weighting factor in the motion estimation process, and normal motion estimation can be done with the weighting factor assumed to be equal to 1. Otherwise, the weighting factor estimate is applied to the reference picture. Motion estimation is then performed using any method which calculates SAD or MSE, but with the SAD or MSE calculation performed between the current picture motion block and the displaced motion block in the weighted version of the reference picture, rather than the un-weighted reference picture. The estimation of the weighting factor can be refined after the motion vectors have been selected, if necessary.
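The preparation step described above can be sketched as follows. The estimate used here is the ratio of picture averages; the number of fractional bits (frac_bits), the closeness tolerance, and the function names are assumptions for this example, and the reference picture is assumed to have a non-zero average.

```python
import math

def mean(picture):
    """Average pixel value of a picture given as a list of rows."""
    total = sum(sum(row) for row in picture)
    count = sum(len(row) for row in picture)
    return total / count

def estimate_weight(cur, ref, frac_bits=5):
    """Estimate the weighting factor and limit it to frac_bits of resolution."""
    w = mean(cur) / mean(ref)
    step = 1 << frac_bits
    return math.floor(w * step + 0.5) / step

def reference_for_search(cur, ref, tolerance=1 / 64):
    """Return (weight, picture to search). When the weight is very close to 1,
    the un-weighted reference is searched as in normal motion estimation."""
    w = estimate_weight(cur, ref)
    if abs(w - 1.0) <= tolerance:
        return 1.0, ref
    return w, [[w * p for p in row] for row in ref]

ref = [[100, 120], [140, 160]]
cur = [[50, 60], [70, 80]]  # the reference faded to half brightness
w, weighted = reference_for_search(cur, ref)
```

Any SAD or MSE based search can then be run against the returned picture in place of the un-weighted reference.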

The current motion vectors are applied to the weighted reference picture to form the weighted, motion compensated reference picture. A difference measure between the weighted, motion compensated reference picture and the current picture is computed. If the difference measure is lower than a threshold, or lower than the previous best difference measure, the process is complete, and the current candidate motion vectors and weighting factor are accepted.

If the difference measure is higher than some threshold, the weighting factor can be refined. In this case, a motion compensated but un-weighted reference picture is formed based on the current candidate motion vectors. The weighting factor estimate is refined using the motion compensated reference picture and the current picture, rather than using the un-compensated reference picture, as was done in forming the initial estimate of the weighting factor.

In one embodiment, the initial estimate of the weighting factor, w, is the ratio of the average value of the pixels in the current picture, cur, to the average value of the pixels in the reference picture, ref, where:
w = avg(cur) / avg(ref)  (4)

The refinement estimates are the ratio of the average of the pixels in the current picture to the average of the pixels in the motion compensated reference picture, mcref, where:
w = avg(cur) / avg(mcref)  (5)

The difference measure diff is the absolute value of the average of pixel differences between the current picture, cur, and the weighted motion compensated reference picture, wmcref, where:
diff = |Σ(cur − wmcref)|  (6)

In another embodiment, the difference measure is the sum of the absolute differences of the pixels in the current picture and in the weighted motion compensated reference picture, where:
diff = Σ|cur − wmcref|  (7)
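The iterative process built from equations (4), (5) and (7) can be sketched as below. This is a simplified illustration under several assumptions: pictures are lists of rows whose dimensions are multiples of the block size, a per-block full search is used, and the loop stops when the difference falls below a threshold, stops improving, or reaches max_iter. The function names and default parameters are hypothetical.

```python
def get_block(pic, top, left, size):
    return [row[left:left + size] for row in pic[top:top + size]]

def abs_diff_sum(a, b):
    """Sum of absolute pixel differences between two equally sized arrays."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def avg(pic):
    return sum(map(sum, pic)) / (len(pic) * len(pic[0]))

def scale(pic, w):
    return [[w * p for p in row] for row in pic]

def search(cur, ref, size, rng):
    """Per-block full search; returns {(top, left): (dy, dx)}."""
    height, width = len(cur), len(cur[0])
    mvs = {}
    for top in range(0, height, size):
        for left in range(0, width, size):
            blk = get_block(cur, top, left, size)
            best = None
            for dy in range(-rng, rng + 1):
                for dx in range(-rng, rng + 1):
                    t, l = top + dy, left + dx
                    if 0 <= t <= height - size and 0 <= l <= width - size:
                        cost = abs_diff_sum(blk, get_block(ref, t, l, size))
                        if best is None or cost < best[0]:
                            best = (cost, dy, dx)
            mvs[(top, left)] = best[1:]
    return mvs

def compensate(ref, mvs, size):
    """Build the motion compensated picture from per-block motion vectors."""
    out = [[0] * len(ref[0]) for _ in ref]
    for (top, left), (dy, dx) in mvs.items():
        blk = get_block(ref, top + dy, left + dx, size)
        for r in range(size):
            out[top + r][left:left + size] = blk[r]
    return out

def find_weight_and_motion(cur, ref, size=2, rng=1, threshold=1.0, max_iter=4):
    w = avg(cur) / avg(ref)                        # equation (4)
    best = None
    for _ in range(max_iter):
        mvs = search(cur, scale(ref, w), size, rng)
        mcref = compensate(ref, mvs, size)         # compensated, un-weighted
        diff = abs_diff_sum(cur, scale(mcref, w))  # equation (7)
        improved = best is None or diff < best[0]
        if improved:
            best = (diff, w, mvs)
        if diff < threshold or not improved:       # assumed stopping rule
            break
        w = avg(cur) / avg(mcref)                  # equation (5)
    return best

ref = [[20 * r + 2 * c + 10 for c in range(4)] for r in range(4)]
cur = scale(ref, 0.5)  # a pure fade with no motion
diff, w, mvs = find_weight_and_motion(cur, ref)
```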

When block-based motion estimation is performed, the same pixel in a reference picture is used for numerous SAD calculations. In an exemplary embodiment during the motion estimation process, once a weighting factor has been applied to a pixel in a reference picture, the weighted pixel is stored, in addition to the normal pixel. The storage may be done either for a region of the picture, or for the entire picture.
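One way to picture this storage is a cache that keeps each weighted pixel alongside the normal pixel, so that overlapping SAD calculations do not repeat the multiplication. The class below is a hypothetical sketch; the name WeightedReferenceCache, the dictionary-based storage, and the multiplies counter (included only to make the saving visible) are assumptions.

```python
class WeightedReferenceCache:
    """Stores weighted pixels in addition to the normal reference pixels."""

    def __init__(self, ref, weight):
        self.ref = ref            # normal (un-weighted) pixels are kept as-is
        self.weight = weight
        self._weighted = {}       # (row, col) -> weighted pixel
        self.multiplies = 0       # counts how often the weight was applied

    def pixel(self, row, col):
        key = (row, col)
        if key not in self._weighted:
            self._weighted[key] = self.weight * self.ref[row][col]
            self.multiplies += 1
        return self._weighted[key]

    def sad(self, cur_block, top, left):
        return sum(abs(p - self.pixel(top + r, left + c))
                   for r, row in enumerate(cur_block)
                   for c, p in enumerate(row))

ref = [[10, 20, 30, 40], [50, 60, 70, 80]]
cache = WeightedReferenceCache(ref, 0.5)
cur_block = [[10, 15], [30, 35]]
sad_a = cache.sad(cur_block, 0, 0)
sad_b = cache.sad(cur_block, 0, 1)  # overlaps the first candidate by one column
```

The two candidate positions read eight pixels in total, but only six distinct reference pixels are weighted.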

The weighted reference picture values may be clipped to be stored with the same number of bits as an unweighted reference, such as 8 bits, for example, or may be stored using more bits. If clipping is performed for the motion compensation process, which is more memory efficient, the weighting factor is reapplied to the reference picture for the actual selected motion vector, the difference is calculated using additional bits, and the clipping is performed after the difference in order to avoid mismatch with a decoder, which might otherwise occur if the decoder does not perform clipping after the weighting factor is applied.

When multiple reference pictures are used to encode a picture, a separate weighting factor can be calculated for each reference picture. During motion estimation, a motion vector and a reference picture index are selected for each motion block. For each iteration of the process, motion vectors and weighting factors are found for each reference picture.

In a preferred embodiment, during motion estimation, the best reference picture for a given motion block is determined. Calculation of the difference measure is done separately for each reference picture, with only those motion blocks that use that reference picture being used in the calculation. Refinement of the weighting factor estimate for a given reference picture also uses only those motion blocks that are coded using that reference picture. For bi-predictive coding, weighting factors and motion vectors can be determined separately for each of the two predictions, which will be averaged together to form the averaged prediction.
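The per-reference refinement can be sketched as follows, with each motion block contributing only to the weighting factor of the reference picture it uses. The input layout (a list of tuples holding the reference index, the current block, and its motion compensated reference block) and the function name refine_weights are assumptions for illustration; the ratio of sums equals the ratio of averages here because both sums cover the same number of pixels.

```python
def refine_weights(blocks):
    """blocks: list of (ref_idx, cur_block, mc_block) tuples.

    Returns {ref_idx: weight}, where each weight is computed only from the
    motion blocks coded with that reference picture.
    """
    sums = {}
    for ref_idx, cur_block, mc_block in blocks:
        cur_sum, mc_sum = sums.get(ref_idx, (0, 0))
        cur_sum += sum(map(sum, cur_block))
        mc_sum += sum(map(sum, mc_block))
        sums[ref_idx] = (cur_sum, mc_sum)
    # Assumption: every referenced picture has a non-zero pixel sum.
    return {idx: cur_sum / mc_sum for idx, (cur_sum, mc_sum) in sums.items()}

blocks = [
    (0, [[50, 50]], [[100, 100]]),
    (1, [[30, 30]], [[20, 20]]),
    (0, [[25, 75]], [[50, 150]]),
]
weights = refine_weights(blocks)
```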

The principles of the present invention can be applied to many different types of motion estimation algorithms. When used with hierarchical approaches, the iteration of weighting factor selection and motion vector selection can be used with any level of the motion estimation hierarchy. For example, the iterative approach could be used with integer picture element (“pel”) motion estimation. After the weighting factor and integer motion vectors are found using the provided iterative algorithm, the sub-pel motion vectors may be found without requiring another iteration of the weighting factor selection.

These and other features and advantages of the present invention may be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the principles of the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.

Most preferably, the principles of the present invention are implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit.

It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present invention.

Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present invention is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.

1. A video encoder for encoding video signal data for at least one cross-fade picture disposed temporally between a fade-out start picture and a fade-in end picture, which are used as reference pictures for coding the at least one cross-fade picture, the encoder comprising: a reference picture weighting applicator; and a reference picture weighting factor unit in signal communication with the reference picture weighting applicator for assigning weighting factors corresponding to each of the fade-out start picture and the fade-in end picture, respectively, for coding the at least one cross-fade picture.
2. A video encoder as defined in claim 1, further comprising a motion compensation unit in signal communication with the reference picture weighting applicator for providing at least one of a motion compensated fade-out start picture and a motion compensated fade-in end picture responsive to the reference picture weighting factor unit for coding the at least one cross-fade picture.
3. A video encoder as defined in claim 2, further comprising a reference picture store in signal communication with each of the reference picture weighting factor unit and the motion compensation unit for storing each of the fade-out start picture and the fade-in end picture.
4. A video encoder as defined in claim 2 wherein the reference picture weighting applicator applies a weighting factor selected by the reference picture weighting factor unit to at least one of the motion compensated fade-out start picture and the motion compensated fade-in end picture.
5. A video encoder as defined in claim 4 usable with bi-predictive picture predictors, the encoder further comprising prediction means for forming first and second predictors from the weighted and motion compensated fade-out start and fade-in end pictures, respectively.
6. A video encoder as defined in claim 5 wherein the weighted and motion compensated fade-out start and fade-in end pictures, respectively, are each from opposite directions relative to all of the at least one cross-fade pictures.
7. A video encoder as defined in claim 1, further comprising a motion estimation unit in signal communication with the reference picture weighting factor unit for providing motion estimation responsive to weighting factor in an explicit mode of operation.
8. A video encoder as defined in claim 2, further comprising a summing unit in signal communication with the reference picture weighting factor unit for applying an offset to the weighted motion compensated reference picture in an explicit mode of operation.
9. A method for encoding cross-fades between pictures, the method comprising: identifying pictures for which a cross-fade is defined; determining appropriate end-points from pictures for which said cross-fade is defined; and encoding said end-points prior to encoding at least one picture intermediate to said end-points.
10. A method as defined in claim 9 wherein said end-points from pictures for which said cross-fade is defined are used as reference pictures when encoding at least one picture intermediate to said end-points.
11. A method as defined in claim 9, further comprising: receiving a substantially uncompressed fade-out start picture; receiving a substantially uncompressed fade-in end picture; assigning a weighting factor for the at least one picture corresponding to the fade-out start picture; and assigning a weighting factor for the at least one picture corresponding to the fade-in end picture.
12. A method as defined in claim 11, further comprising: computing motion vectors corresponding to the difference between the at least one cross-fade picture and at least one of the fade-out start picture and the fade-in end picture; motion compensating the at least one of the fade-out start picture and the fade-in end picture in correspondence with the motion vectors; multiplying the motion compensated at least one of the fade-out start picture and the fade-in end picture by the assigned weighting factor, respectively, to form at least one weighted motion compensated reference picture; and subtracting the at least one weighted motion compensated reference picture from the at least one cross-fade picture; and encoding a signal indicative of the difference between the at least one cross-fade picture and the at least one weighted motion compensated reference picture.
13. A method as defined in claim 12 wherein exactly two reference pictures are used, the exactly two reference pictures comprising the pre-coded fade-out start picture, FP0, and the fade-in end picture, FP1.
14. A method as defined in claim 13, further comprising: combining the motion compensated fade-out start picture with the motion compensated fade-in end picture prior to subtracting from the at least one cross-fade picture.
15. A method as defined in claim 12 wherein computing motion vectors comprises: testing within a search region for every displacement within a pre-determined range of offsets relative to the at least one cross-fade picture; calculating at least one of the sum of the absolute difference and the mean squared error of each pixel in the at least one cross-fade picture with a motion compensated reference picture; and selecting the offset with the lowest sum of the absolute difference and mean squared error as the motion vector.
16. A method as defined in claim 12 wherein computing motion vectors comprises: testing within a search region for every displacement within a pre-determined range of offsets relative to the at least one cross-fade picture; calculating at least one of the sum of the absolute difference and the mean squared error of each pixel in the at least one cross-fade picture with a first motion compensated reference picture corresponding to the fade-out start picture; selecting an offset with the lowest sum of the absolute difference and mean squared error as the motion vector for the fade-out start picture; calculating at least one of the sum of the absolute difference and the mean squared error of each pixel in the image block with a second motion compensated reference picture corresponding to the fade-in end picture; and selecting an offset with the lowest sum of the absolute difference and mean squared error as the motion vector for the fade-in end picture.
17. A method as defined in claim 11 wherein the weighting factors for the fade-out start picture and the fade-in end picture, respectively, are each responsive to the relative distance between the at least one cross-fade picture and the fade-out start picture or the fade-in end picture, respectively, in an implicit mode of operation.
18. A video CODEC comprising an encoder as defined in claim 1 and a decoder for decoding video signal data for a cross-fade picture relative to each of a fade-out start picture and a fade-in end picture to predict the cross-fade picture, the decoder comprising a reference picture weighting factor unit having an output for determining weighting factors corresponding to each of the fade-out start picture and the fade-in end picture.
19. A video CODEC as defined in claim 18 wherein the reference picture weighting factor unit has a second output for determining offsets corresponding to each of the fade-out start picture and the fade-in end picture.
20. A video CODEC as defined in claim 18, further comprising a variable length decoder in signal communication with the reference picture weighting factor unit for providing indices corresponding to each of the fade-out start picture and the fade-in end picture to the reference picture weighting factor unit.
21. A video CODEC as defined in claim 18, further comprising a motion compensator in signal communication with the reference picture weighting factor unit for providing motion compensated reference pictures responsive to the reference picture weighting factor unit.
22. A video CODEC as defined in claim 21, further comprising a reference picture weighting applicator in signal communication with the motion compensator and the reference picture weighting factor unit for applying a weighting factor to each motion compensated reference picture.
23. A video CODEC as defined in claim 21, further comprising an adder in signal communication with the motion compensator and the reference picture weighting factor unit for applying an offset to each motion compensated reference picture.
24. A video CODEC as defined in claim 18 wherein the video signal data is streaming video signal data comprising block transform coefficients.
25. A video CODEC as defined in claim 18 usable with bi-predictive picture predictors, the decoder further comprising: prediction means for forming first and second predictors from two different reference pictures; averaging means for averaging the first and second predictors together using their corresponding weighting factors to form a single averaged predictor.